Safelog: Supporting Web Search and Mining by Differentially-Private Query Logs
نویسندگان
چکیده
Query logs can be very useful for advancing web search and web mining research. Since these web query logs contain private, possibly sensitive data, they need to be effectively anonymized before they can be released for research use. Anonymization of query logs differs from that of structured data since they are generated based on natural language and the vocabulary (domain) is infinite. This unstructured, unbounded data set poses different challenges for producing privacy-preserving query logs. To mitigate these challenges, we propose using a differential privacy framework called Safelog to generate anonymized query logs that contain sufficient contextual information to allow existing web search and web mining algorithms to use the data and attain meaningful results. The key to achieving high privacy guarantees is the introduction of a query pool for augmenting the query log during the sanitation process. We empirically validate the effectiveness of our framework for generating usable, privacy preserving logs for web search, and demonstrate that it is possible to maintain high utility for this task while guaranteeing sufficient levels of privacy. We conclude with a discussion of other web mining tasks that can be supported by these anonymized logs and show some preliminary results for the task of website clustering.
منابع مشابه
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
متن کاملQuery Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the increasing web, there are many challenges to establish a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, But the problem ...
متن کاملEnhancing Web Search through Query Log Mining
INTRODUCTION Web query log is a type of file keeping track of the activities of the users who are utilizing a search engine. Compared to traditional information retrieval setting in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clus...
متن کاملQuery expansion based on relevance feedback and latent semantic analysis
Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
متن کاملQuery Segmentation for Web Search
This paper describes a query segmentation method for search engines supporting inverse lookup of words and phrases. Data mining in query logs and document corpora is used to produce segment candidates and compute connexity measures. Candidates are considered in context of the whole query, and a list of the most likely segmentations is generated, with each segment attributed with a connexity val...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016